TSM2X: High-performance tall-and-skinny matrix–matrix multiplication on GPUs
Authors
Abstract
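Linear algebra operations have been widely used in big data analytics and scientific computations. Many works have been done on optimizing linear algebra operations on GPUs with regular-shaped input. However, few works focus on fully utilizing GPU resources when the input is not regular-shaped. Current optimizations do not consider fully utilizing the memory bandwidth and computing power; therefore, they can only achieve sub-optimal performance. In this paper, we propose two efficient algorithms, TSM2R and TSM2L, for two classes of tall-and-skinny matrix-matrix multiplications on GPUs. Both of them focus on optimizing linear algebra operations with at least one of the input matrices tall-and-skinny. Specifically, TSM2R is designed for a large regular-shaped matrix multiplying a tall-and-skinny matrix, while TSM2L is designed for a tall-and-skinny matrix multiplying a small regular-shaped matrix. We implement our proposed algorithms and test them on several modern NVIDIA GPU micro-architectures. Experiments show that, compared to the current state-of-the-art works, (1) TSM2R speeds up the computation by 1.1x~3x and improves the memory bandwidth utilization and computing power utilization by 8%~47.6% and 7%~37.3%, respectively, when the regular-shaped matrix size is relatively large or medium; and (2) TSM2L speeds up the computation by 1.1x~3.5x and improves the memory bandwidth utilization by up to 55% when the regular-shaped matrix size is relatively small.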
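To illustrate why the two tall-and-skinny classes stress memory bandwidth rather than computing power, the following minimal sketch estimates the arithmetic intensity (floating-point operations per byte moved) of a matrix product. It is not taken from the paper: the function name `arithmetic_intensity`, the matrix sizes, and the assumption that each matrix crosses the memory bus exactly once in double precision are all illustrative choices.

```python
def arithmetic_intensity(m, k, n, bytes_per_elem=8):
    """Estimate FLOPs per byte for C (m x n) = A (m x k) @ B (k x n).

    Assumes (hypothetically) that A and B are each read once and C is
    written once, with 8-byte double-precision elements by default.
    """
    flops = 2 * m * k * n                      # one multiply + one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


# TSM2R-like shape: large regular-shaped A (n x n) times tall-and-skinny B (n x 8).
tsm2r_like = arithmetic_intensity(20480, 20480, 8)

# TSM2L-like shape: tall-and-skinny A (n x 8) times small regular-shaped B (8 x 8).
tsm2l_like = arithmetic_intensity(1_000_000, 8, 8)

# Regular-shaped reference: square A (n x n) times square B (n x n).
square = arithmetic_intensity(20480, 20480, 20480)

print(f"TSM2R-like: {tsm2r_like:.2f} FLOPs/byte")
print(f"TSM2L-like: {tsm2l_like:.2f} FLOPs/byte")
print(f"Square:     {square:.2f} FLOPs/byte")
```

Under these assumptions, both tall-and-skinny shapes perform only a few operations per byte, while the square product performs well over a thousand. That gap is consistent with the abstract reporting memory bandwidth utilization alongside computing power utilization.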
Similar resources
High Performance Relevance Vector Machine on GPUs
The Relevance Vector Machine (RVM) algorithm has been widely utilized in many applications, such as machine learning, image pattern recognition, and compressed sensing. However, the RVM algorithm is computationally expensive. We seek to accelerate the RVM algorithm computation for time sensitive applications by utilizing massively parallel accelerators such as GPUs. In this paper, the computati...
Bitsliced High-Performance AES-ECB on GPUs
In order to perform high-performance Monte Carlo simulations of fracture in certain composite materials, we needed fast methods for generating deterministic random numbers. We made several design choices, and due to the fact that the entire simulation was to be done on both CPUs and GPUs, we designed new methods for fast implementation of the AES in the ECB mode on such architectures. This pape...
Multiple Precision Integer Multiplication on GPUs
This paper addresses multiple precision integer multiplication on GPUs. In this paper, we propose a novel data-structure named a product digit table and present a GPU algorithm to perform the multiplication with the product digit table. Experimental results on a 3.10 GHz Intel Core i3-2100 CPU and an NVIDIA GeForce GTX480 GPU show that the proposed GPU algorithm respectively runs over 71.4 time...
Optimizing Sparse Matrix-Vector Multiplication on GPUs
We are witnessing the emergence of Graphics Processing Units (GPUs) as powerful massively parallel systems. Furthermore, the introduction of new APIs for general-purpose computations on GPUs, namely CUDA from NVIDIA, Stream SDK from AMD, and OpenCL, makes GPUs an attractive choice for high-performance numerical and scientific computing. Sparse Matrix-Vector multiplication (SpMV) is one of the mo...
High-Performance Matrix Multiplication
This document describes techniques for speeding up matrix multiplication on some high-performance computer architectures, including the IBM RS-6000, the IBM 3090/600S-VF, the MIPS RC3240 and RC6280, the Stardent 3040, and the Sun SPARCstation. The methods illustrate general principles that can be applied to the inner loops of scientific code.
Journal
Journal title: Journal of Parallel and Distributed Computing
Year: 2021
ISSN: 1096-0848, 0743-7315
DOI: https://doi.org/10.1016/j.jpdc.2021.02.013